Building the Croatian National Corpus
Abstract
The paper presents the work being done so far on the building of the Croatian National Corpus (HNK). It has been collected since [...] at the Institute of Linguistics, Faculty of Philosophy, University of Zagreb. The size, time span, its composition and criteria for text selection are being presented. The HNK consists of two parts: a [...] million corpus of contemporary Croatian language, and the Croatian Electronic Textual Archive. The procedures of the corpus mark-up and processing are being discussed. One of the most interesting features of this corpus, since its launch in [...], is its availability for querying through the WWW. The future directions of the [...]m corpus, enlargement to [...]m in the next few years, enhanced corpus management and querying, as well as annotation and processing, are being discussed at the end.
Similar resources
Generation of Verbal Stems in Derivationally Rich Language
The paper presents a procedure for generating prefixed verbs in Croatian comprising combinations of one, two or three prefixes. The result of this generation process is a pool of derivationally valid prefixed verbs, although not necessarily occurring in corpora. The statistics of occurrences of generated verbs in the Croatian National Corpus have been calculated. Further usage of such language resourc...
Building the Croatian Dependency Treebank: the initial stages
The paper presents the work in progress on building the Croatian Dependency Treebank. Its design principles, procedures and the pilot corpus used within are described. Perspectives for further development of the Croatian Dependency Treebank are presented at the end.
Building the Croatian-English Parallel Corpus
The contribution gives a survey of procedures and formats used in building the Croatian-English parallel corpus, which is being collected at the Institute of Linguistics at the Philosophical Faculty, University of Zagreb. The primary text source is the newspaper Croatia Weekly, which has been published from the beginning of 1998 by HIKZ (Croatian Institute for Information and Culture). After quick su...
Enlarging the Croatian Morphological Lexicon by Automatic Lexical Acquisition from Raw Corpora
This paper presents experiments for enlarging the Croatian Morphological Lexicon by applying an automatic acquisition methodology. The basic sources of information for the system are a set of morphological rules and a raw corpus. The morphological rules have been automatically derived from the existing Croatian Morphological Lexicon, and in our experiments we have used a subset of the Croatian N...
Improving Chunking Accuracy on Croatian Texts by Morphosyntactic Tagging
In this paper, we present the results of an experiment in using a stochastic morphosyntactic tagger as a pre-processing module of a rule-based chunker and partial parser for Croatian, in order to raise its overall chunking and partial parsing accuracy on Croatian texts. To conduct the experiment, we have manually chunked and partially parsed 459 sentences from the Croatia Weekly 1...